Abstract

Subsequent to the two rounds of whole-genome duplication that occurred in the common ancestor of vertebrates, a third 
genome duplication occurred in the stem lineage of teleost fishes. This teleost-specific genome duplication (TGD) is thought to 
have provided genetic raw materials for the physiological, morphological, and behavioral diversification of this highly speciose 
group. The extreme physiological versatility of teleost fish is manifest in their diversity of blood-gas transport traits, which reflects 
the myriad solutions that have evolved to maintain tissue O2 delivery in the face of changing metabolic demands and environmental 
O2 availability during different ontogenetic stages. During the course of development, regulatory changes in blood-O2 
transport are mediated by the expression of multiple, functionally distinct hemoglobin (Hb) isoforms that meet the particular 
O2-transport challenges encountered by the developing embryo or fetus (in viviparous or oviparous species) and in 
free-swimming larvae and adults. The main objective of the present study was to assess the relative contributions of 
whole-genome duplication, large-scale segmental duplication, and small-scale gene duplication in producing the extraordinary 
functional diversity of teleost Hbs. To accomplish this, we integrated phylogenetic reconstructions with analyses of conserved 
synteny to characterize the genomic organization and evolutionary history of the globin gene clusters of teleosts. These results 
were then integrated with available experimental data on functional properties and developmental patterns of stage-specific 
gene expression. Our results indicate that multiple α- and β-globin genes were present in the common ancestor of gars (order 
Lepisosteiformes) and teleosts. The comparative genomic analysis revealed that teleosts possess a dual set of TGD-derived globin 
gene clusters, each of which has undergone lineage-specific changes in gene content via repeated duplication and deletion events. 
Phylogenetic reconstructions revealed that paralogous genes convergently evolved similar functional properties in different 
teleost lineages. Consistent with other recent studies of globin gene family evolution in vertebrates, our results revealed evidence 
for repeated evolutionary transitions in the developmental regulation of Hb synthesis. 
Introduction

Evidence suggests that two successive rounds of whole- 
genome duplication that occurred early in vertebrate evolu- 
tion may have played an important role in the evolution of 
vertebrate-specific innovations (Holland et al. 1994; Meyer 
1998; Meyer and Schartl 1999; Shimeld and Holland 2000; 
Wada 2001; Hoegg and Meyer 2005; Wada and Makabe 
2006; Zhang and Cohn 2008; Van de Peer et al. 2009; 
Hoffmann, Opazo, and Storz 2012). Roughly 320-400 Ma, a 
third genome duplication occurred in the stem lineage of 
teleost fish (infraclass Teleostei) following divergence from 
nonteleost ray-finned fish (Amores et al. 1998, 2011; 
Postlethwait et al. 2000; Taylor et al. 2001, 2003; Van de 
Peer et al. 2003; Hoegg et al. 2004; Jaillon et al. 2004; Meyer 
and Van de Peer 2005; Kasahara et al. 2007; Sato and Nishida 
2010). The teleost-specific genome duplication (TGD) is 
thought to have provided raw materials for the physiological, 
morphological, and behavioral diversification of teleost fish, 
perhaps facilitating the radiation of this speciose group into 
diverse marine and freshwater environments across the 
planet. Evidence in support of a causal connection between 
the TGD and phenotypic innovation is provided by studies of 
TGD-derived gene duplicates that evolved distinct physio- 
logical or developmental functions in various teleost lineages 
(Meyer and Malaga-Trillo 1999; Lister et al. 2001; Mulley et al. 
2006; Braasch et al. 2006, 2007; Hashiguchi and Nishida 2007; 
Hoegg and Meyer 2007; Sato and Nishida 2007; Siegel et al. 
2007; Yu et al. 2007; Douard et al. 2008; Braasch, Brunet, et al. 
2009; Braasch, Volff, et al. 2009; Sato et al. 2009a, 2009b; 
Arnegard et al. 2010). 

The extreme physiological versatility of teleost fishes is 
manifest in their diversity of blood-gas transport traits 
(Wells 2009). This diversity reflects the myriad solutions 
that have evolved to maintain tissue O2 delivery in the 
face of changing metabolic demands and environmental 
O2 availability during different ontogenetic stages. Relative 
to air-breathing vertebrates, fish generally contend with 
far greater vicissitudes of environmental O2 availability, 
which is largely because O2 solubility (and hence, the 
availability of dissolved O2 for respiration) varies as a function 
of water temperature. During ontogeny, regulatory 
changes in blood-O2 transport are mediated by the expression 
of multiple, functionally distinct hemoglobin (Hb) 
isoforms that are adapted to the particular O2-transport 
challenges encountered by the developing embryo or fetus 
(in viviparous or oviparous species) and in free-swimming 
larvae and adults (reviewed by Ingermann 1997; Jensen 
et al. 1998). As in other vertebrates, the developmental 
regulation of Hb synthesis in fish involves differential expression 
of duplicated genes that encode the α- and β-chain 
subunits of distinct tetrameric α2β2 Hb isoforms (Chan 
et al. 1997; Brownlie et al. 2003; Maruyama, Yasumasu, and 
Iuchi 2004; Maruyama, Yasumasu, Naruse, et al. 2004; Tiedke 
et al. 2011). 

Most teleost fish also coexpress functionally distinct Hb 
isoforms during posthatching life, and these isoforms can be 
broadly classified (based on electrophoretic mobility at 
pH > 8.0) as "anodic" or "cathodic." The anodic Hbs have 
relatively low O2 affinities and a pronounced Bohr effect 
(decreased Hb-O2 affinity at low pH), whereas the cathodic 
Hbs have relatively high O2 affinities, an enhanced responsiveness 
to allosteric regulation by organic phosphates, and a 
reversed Bohr effect (increased Hb-O2 affinity at low pH) in 
the absence of organic phosphates (Weber and Jensen 1988; 
Weber 1990, 2000; Jensen et al. 1998; Weber et al. 2000; Wells 
2009). Experimental evidence for some species suggests that 
regulatory changes in intraerythrocytic Hb isoform compos- 
ition may play a role in the acclimatization response to en- 
vironmental hypoxia (e.g., Rutjes et al. 2007), but it has not 
been possible to formulate any broadly consistent empirical 
generalizations (Weber and Jensen 1988; Weber 1990, 2000; 
Ingermann 1997; Wells 2009). A remarkable feature of nearly 
all anodic Hb isoforms of teleost fish is the Root effect, an 
extreme reduction in Hb-O2 binding capacity at low pH, even 
when blood O2 tension remains high. The Root effect is con- 
sidered a key evolutionary innovation in teleost fish, as it plays 
a critical role in secreting O2 into the swim bladder for buoy- 
ancy control and in supplying O2 to the avascular retina 
(Pelster and Weber 1991; Berenbrink et al. 2005; Berenbrink 
2007; Wells 2009). 

The proto α- and β-globin genes of jawed vertebrates 
(Gnathostomata) represent the product of an ancient gene 
duplication event that occurred roughly 450-500 Ma in the 
Ordovician, before the divergence between cartilaginous fish 
(Chondrichthyes) and the common ancestor of ray-finned 
fish (Actinopterygii) and lobe-finned fish + tetrapods 
(Sarcopterygii; Goodman et al. 1987; Storz et al. 2011, 2012; 
Hoffmann, Opazo, and Storz 2012). Subsequent rounds of 
duplication and divergence gave rise to diverse repertoires 
of α- and β-like globin genes that are developmentally regulated 
in different ways in different vertebrate lineages 
(Hardison 2001; Hoffmann, Storz, et al. 2010). The ancestral 
linkage arrangement of the α- and β-globin genes is still retained 
in at least some cartilaginous fish (Marino et al. 2007), 
teleosts (Chan et al. 1997; Miyata and Aoki 1997; Gillemans 
et al. 2003; Pisano et al. 2003), and amphibians (Hentschel 
et al. 1979; Jeffreys et al. 1980; Kay et al. 1980; Hosbach et al. 
1983; Fuchs et al. 2006). In amniote vertebrates, by contrast, 
the α- and β-globin gene clusters are located on different 
chromosomes due to transposition of the proto β-globin 
gene to a new genomic location sometime after the stem 
lineage of amniotes split from the line leading to amphibians 
(Hardison 2008; Patel et al. 2008, 2010; Hoffmann, Opazo, and 
Storz 2012). 

The main objective of the present study was to assess 
the relative contributions of whole-genome duplication, 
large-scale segmental duplication, and small-scale gene dupli- 
cation in producing the extraordinary functional diversity of 
teleost Hbs. To accomplish this, we integrated phylogenetic 
reconstructions with analyses of conserved synteny to char- 
acterize the genomic organization and evolutionary history of 
the globin gene clusters of teleost fish. These results were then 
integrated with available experimental data on functional 
properties and developmental patterns of stage-specific 
gene expression. Results of the phylogenetic and comparative 
genomic analyses revealed repeated evolutionary transitions 
in stage-specific expression in different teleost lineages. 
Our analyses also revealed that functionally distinct anodic 
and cathodic adult Hbs evolved independently in different 
teleost lineages, providing evidence for convergence in 
the physiological division of labor between coexpressed Hb 
isoforms. 

Materials and Methods 

Data Collection 

We used bioinformatic techniques to manually annotate 
the full complement of globin genes in the genomes of six 
teleost fish available in release 67 of the Ensembl database 
(fugu, Takifugu rubripes; medaka, Oryzias latipes; green-spotted 
puffer, Tetraodon nigroviridis; tilapia, Oreochromis 
niloticus; three-spined stickleback, Gasterosteus aculeatus; 
and zebrafish, Danio rerio). We also annotated the globin 
genes from a live-bearing teleost (platyfish, Xiphophorus 
maculatus) and a nonteleost ray-finned fish (spotted gar, 
Lepisosteus oculatus), both available from the Pre!Ensembl 
database. We compared the Ensembl data with previous reports 
on the genomic organization of the globin gene clusters 
in fugu (Flint et al. 2001), medaka (Maruyama, Yasumasu, and 
Iuchi 2004; Maruyama, Yasumasu, Naruse, et al. 2004), and 
zebrafish (Brownlie et al. 2003). We also included coding sequences 
from the full complement of globin genes from 
Atlantic cod (Gadus morhua; Borza et al. 2009; Wetten 
et al. 2010) and Atlantic salmon (Salmo salar; Quinn et al. 
2010). However, the fragmentary state of the cod and salmon 
genome assemblies precluded a detailed comparative analysis 
of the globin gene clusters in these two species. Finally, we 
included additional records from tetrapod vertebrates and 
cartilaginous fish as outgroup sequences for phylogenetic 
analyses, and we included genomic contigs from representa- 
tive tetrapods for the purpose of making synteny compari- 
sons. When possible, the annotated genomic sequences were 
validated by comparison with the relevant expressed se- 
quence tag (EST) databases. 
Assessments of Conserved Synteny 
To examine patterns of conserved synteny, we annotated the 
genes found upstream and downstream of the globin gene 
clusters of seven teleost species (fugu, medaka, platyfish, 
green-spotted puffer, stickleback, tilapia, and zebrafish) and 
one nonteleost ray-finned fish (spotted gar). Initial ortholog 
predictions were derived from the EnsemblCompara data- 
base (Vilella et al. 2009) and were visualized using the program 
Genomicus (Muffato et al. 2010). In addition, we used the 
program Genscan (Burge and Karlin 1997) to identify additional 
unannotated genes lying upstream and downstream of 
the annotated globin genes. The unannotated genes were 
compared with the nonredundant protein database using 
Basic Local Alignment Search Tool (BLAST) (Altschul et al. 
1990). Partial sequences for genes of interest (representing 
pseudogenes or artifacts related to incomplete sequence 
coverage) were identified and annotated with BLAST. To 
examine large-scale patterns of sequence conservation, we 
conducted pairwise comparisons of sequence similarity be- 
tween globin gene clusters using the Pipmaker and 
Multipipmaker programs (Schwartz et al. 2000, 2003). To fa- 
cilitate comparisons, genes have been labeled following the 
Zebrafish Model Organism Database nomenclature guide- 
lines. Finally, we conducted an analysis of conserved synteny 
between the globin gene clusters of medaka and the recon- 
structed protokaryotype of the pre-TGD teleost common 
ancestor provided by Kasahara et al. (2007) and Nakatani 
et al. (2007). 
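
As a minimal sketch of the kind of comparison described above (not the pipeline used in this study, which relied on EnsemblCompara, Genomicus, Genscan, and BLAST), conserved flanking genes can be identified by intersecting the sets of genes annotated next to a cluster in each species. The function name and the species and gene lists below are hypothetical placeholders chosen only for illustration.

```python
from typing import Dict, List, Set


def conserved_flank(flanks: Dict[str, List[str]]) -> Set[str]:
    """Return the genes found in the flanking region of every species."""
    gene_sets = [set(genes) for genes in flanks.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


# Hypothetical left-flank annotations (illustrative only, not the study's data).
left_flank = {
    "species_A": ["aanat", "rhbdf1a", "mpg", "nprl3"],
    "species_B": ["rhbdf1a", "mpg", "nprl3"],
    "species_C": ["aanat", "mpg", "nprl3", "geneX"],
}

shared = conserved_flank(left_flank)
print(sorted(shared))  # ['mpg', 'nprl3']
```

A set intersection ignores gene order and orientation, so it only answers whether the same loci are present; gene order would need to be compared separately.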

Sequence Alignment 

Separate alignments of the α- and β-globin coding sequences 
were based on conceptual translations of nucleotide sequences. 
Alignments were performed using MUSCLE v3.8 
(Edgar 2004) and the E-INS-i, G-INS-i, and L-INS-i strategies 
from MAFFT v6.8 (Katoh et al. 2009). We employed MUMSA 
(Lassmann and Sonnhammer 2005, 2006) to select the 
best-scoring multiple alignment, and we then used the se- 
lected alignment to estimate phylogenetic relationships. 
These sequence manipulations were carried out in the 
Mobyle platform server (Neron et al. 2009) hosted by 
the Institut Pasteur (http://mobyle.pasteur.fr, last accessed 
September 2012). All sequence alignments are provided in 
supplementary data file S1, Supplementary Material online. 
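
One common way to align coding sequences on the basis of their conceptual translations is to translate each sequence, align the proteins, and then thread the original codons back onto the protein alignment. The sketch below illustrates that idea under the assumption of the standard genetic code. The function names and the short example sequence are illustrative and are not taken from this study, and the protein alignment step itself (performed here with MUSCLE and MAFFT) is represented only by a hand-written gapped string.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(cds: str) -> list:
    """Split a coding sequence into complete codons, dropping any trailing bases."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def translate(cds: str) -> str:
    """Conceptually translate a coding sequence; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(codon, "X") for codon in split_codons(cds))


def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map a gapped protein alignment row back onto its codons."""
    codons = split_codons(cds)
    if len(aligned_protein.replace("-", "")) != len(codons):
        raise ValueError("protein and coding sequence lengths disagree")
    out, k = [], 0
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")  # one alignment gap spans a whole codon
        else:
            out.append(codons[k])
            k += 1
    return "".join(out)


cds = "ATGGTGCACCTG"                 # illustrative 4-codon fragment
print(translate(cds))                # MVHL
print(thread_codons("MV-HL", cds))   # ATGGTG---CACCTG
```

Threading codons in this way keeps gaps in multiples of three, so codon positions stay in frame for analyses that assign a separate substitution model to each codon position.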

Phylogenetic Analyses 

We reconstructed separate phylogenies for the α- and 
β-globin genes using Bayesian and maximum likelihood 
approaches. We performed maximum likelihood analyses in 
Treefinder, version March 2011 (Jobb et al. 2004), and we 
evaluated support for the nodes with 1,000 bootstrap pseu- 
doreplicates. We used the "propose model" tool of Treefinder 
to select the best-fitting models of amino acid and nucleotide 
substitution, with an independent model for each codon 
position in analyses based on nucleotide sequences. Model 
selection was based on the Akaike information criterion with 
correction for small sample size. We estimated Bayesian phylogenies 
in MrBayes v.3.1.2 (Ronquist and Huelsenbeck 2003), 
running six simultaneous chains for 2 x 10^ generations, sampling 
every 2.5 x 10^ generations, and using default priors. A 
given run was considered to have reached convergence once 
the likelihood scores reached an asymptotic value and the 
average standard deviation of split frequencies remained 
<0.01. We discarded all trees that were sampled before con- 
vergence, and we evaluated support for the nodes and par- 
ameter estimates from a majority rule consensus of the last 
2,500 trees. 
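
For readers unfamiliar with the model selection criterion mentioned above, the following sketch computes the Akaike information criterion with the small-sample correction, AICc = 2k - 2 ln L + 2k(k + 1)/(n - k - 1), and picks the model with the lowest score. It is not the Treefinder implementation. The function names, the candidate model labels, and the likelihood values are hypothetical, and taking the number of alignment sites as the sample size n is an assumption made here for illustration.

```python
from typing import Dict, Tuple


def aicc(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Akaike information criterion with correction for small sample size."""
    if n_sites - n_params - 1 <= 0:
        raise ValueError("AICc is undefined when n_sites <= n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    correction = (2 * n_params * (n_params + 1)) / (n_sites - n_params - 1)
    return aic + correction


def best_model(fits: Dict[str, Tuple[float, int, int]]) -> str:
    """Return the name of the candidate model with the lowest AICc."""
    return min(fits, key=lambda name: aicc(*fits[name]))


# Hypothetical fits: (log-likelihood, free parameters, alignment sites).
candidate_fits = {
    "model_A": (-1000.0, 10, 200),
    "model_B": (-995.0, 20, 200),
}

for name, fit in candidate_fits.items():
    print(name, round(aicc(*fit), 2))
print("selected:", best_model(candidate_fits))
```

In this made-up example the richer model has the higher likelihood but is penalized for its additional parameters, so the simpler model is selected.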

Results and Discussion 

The comparative genomic analysis revealed that the Hb genes 
of teleost fish are located in two separate chromosomal re- 
gions that are clearly delineated by distinct sets of flanking loci 
(fig. 1). In contrast, the Hb genes of the nonteleost gar are 
located in a single chromosomal region. Following Hardison 
(2008), the teleost globin gene cluster flanked by the mpg and 
nprl3 genes was labeled the "MN" cluster, and the teleost 
globin cluster flanked by the lcmt1 and aqp8 genes was 
labeled the "LA" cluster. In the platyfish assembly, we identified 
two separate scaffolds containing the MN and LA clusters 
(fig. 1) and a third scaffold (JH559524) that contained a single, 
putatively functional β-globin gene. We excluded this latter 
scaffold from all subsequent analyses because it likely represents 
an assembly artifact. The MN and LA clusters correspond 
to the medaka E1 and A1 clusters, respectively, that 
were described by Maruyama et al. (Maruyama, Yasumasu, 
and Iuchi 2004; Maruyama, Yasumasu, Naruse, et al. 2004). 
To facilitate comparison, we report the order of genes in the 
same orientation as they appear in the zebrafish genome 
assembly, regardless of how they are found in the Ensembl 
database. Since the MN and LA clusters of most teleosts 
harbor globin genes in both forward and reverse orientations, 
we use the terms left and right to describe linear gene order. 
The individual α- and β-globin genes in the MN cluster were 
numbered from left to right, such that the functional globin 
gene in the leftmost position of the MN cluster of zebrafish is 
labeled MN Hbb1, the next gene to the right is MN Hba1, and 
so forth, whereas the genes in the LA cluster were numbered 
from right to left, starting with the gene closest to aqp8 
(fig. 2). In the case of cod and salmon globin genes, we retained 
the labels from the original studies (Borza et al. 2009; 
Quinn et al. 2010). Sequence sources for the globin gene 
clusters used in this study are provided in table 1, and the 
annotations for each cluster are provided in supplementary 
table S1, Supplementary Material online. To facilitate comparisons 
with previous studies, we compiled a list of previously 
used names for each of the annotated globin genes 
(supplementary table S2, Supplementary Material online). 

Genomic Structure of the MN and LA Globin Gene 
Clusters in Teleosts 

Patterns of Conserved Synteny 

The genomic context of the teleost globin gene clusters is 
relatively well conserved, especially in the case of the MN 
cluster. In all teleost species analyzed, there is perfect conservation 
for the five genes to the left of the MN cluster: aanat, 
Fig. 1. Unscaled depiction of the genomic organization of the MN and LA globin gene clusters from representative teleost fishes, with the human 
α-globin gene cluster provided as reference. To facilitate comparisons, all clusters are presented in the same orientation as the zebrafish. Genes in the 
forward orientation are shown on top of the chromosome, whereas genes in the reverse orientation are shown below. 



[Figure graphics not reproduced. Recoverable labels: LA cluster and MN cluster; TGD; species shown are fugu, green-spotted puffer, stickleback, platyfish, medaka, tilapia, zebrafish, and gar; legend entries are α-globin, β-globin, pseudogenes, early expression, late expression, early and late expression, and undetermined expression; scale bar, 10 kb.]



Fig. 2. Genomic structure of the MN and LA globin gene clusters of teleost fish. To facilitate comparisons, all clusters are presented in the same 
orientation as the zebrafish. Genes in the forward orientation are shown on top of the chromosome, whereas genes in the reverse orientation are shown 
below. The green-spotted puffer globin genes are assumed to have the same stage-specific expression profiles as their orthologous counterparts in fugu. 
Genes marked with an asterisk were not included in the phylogenetic analyses. 
Table 1. Data Sources, Genomic Coordinates, and Orientations of the Globin Gene Clusters in Fugu, Green-Spotted Puffer, Gar, Medaka, Platyfish, 
Tilapia, Stickleback, and Zebrafish. 

Species                                   Release      Cluster     Location  Orientation         Start       End
Fugu (T. rubripes)                        Fugu 4.0     LA          Sc 3      lcmt1 / rhbdf1b     2,511,982   2,517,737
                                                       MN          Sc 15     kank / nprl3        417,195     420,598
Green-spotted puffer (Tet. nigroviridis)  46           LA          Chr 2     rhbdf1b / aqp8      5,887,638   5,893,221
                                                       MN          Chr 3     kank / nprl3        12,162,093  12,165,924
Gar (Lepisosteus oculatus)                LepOcu1      Hb          LG13      nprl3 / luc7l       2,809       54,885
Medaka (O. latipes)                       Medaka 1.0   LA          Chr 19    - / aqp8            1,478,030   1,487,664
                                                       MN          Chr 8     nprl3 / kank        8,378,078   8,412,019
Platyfish (X. maculatus)                  Xipmac4.4.2  LA          JH557783  rhbdf1b / aqp8      22,438      37,618
                                                       MN          JH556906  nprl3 / kank        106,543     141,798
                                                       Unassigned  JH559524                      5,235       8,503
Tilapia (Ore. niloticus)                  Orenil1.0    LA          GL831136  rhbdf1b / aqp8      111,303     122,995
                                                       MN          GL831149  nprl3 / kank        110,462     169,554
Stickleback (G. aculeatus)                BROADS1      LA          Sc 112    c17orf28 / aqp8     339,530     343,463
                                                       MN          Gr XI     kank / nprl3        13,640,461  13,663,356
Zebrafish (D. rerio)                      Zv9          LA          Chr 12    rhbdf1b / aqp8      21,688,806  21,705,956
                                                       MN          Chr 3     nprl3 / kank        55,938,147  55,999,373

Note: In all cases, data were obtained from Ensembl. The start and end points correspond to the most distant edges from the two genes on either end of the cluster. 



mgrn1, rhbdf1a, mpg, and nprl3 (fig. 1). The two genes to the 
right, kank1 and dock6, are also conserved in all species. The 
genomic organization of the LA globin gene cluster is not as 
strongly conserved. Four of the seven teleost species possess a 
single copy of rhbdf1b to the left of the LA cluster, which is 
paralogous to the rhbdf1a gene found adjacent to the MN 
cluster. On the right side of the LA cluster, all teleost species 
possess copies of lcmt1 and arhgap17. Each of the teleost 
species possesses one or two copies of aqp8, with the exception 
of the two tetraodontid species (fugu and green-spotted 
puffer) that have secondarily lost this gene (fig. 1). 

Hb Gene Repertoires of Teleost Fish 

There were several cases where our manual annotations of 
the globin gene clusters differed from annotations provided in 
the most recent releases of the various teleost genome assemblies. 
For example, no MN-linked globin genes were annotated 
in the most recent release of the fugu genome in the 
Ensembl database. However, BLAST comparisons with an independent 
record of the fugu MN globin cluster (AY016024) 
revealed the presence of two unannotated α-globin genes 
between nprl3 and kank1, as reported by Flint et al. (2001). 
In addition, the only annotated β-globin gene in the green-spotted 
puffer genome (green-spotted puffer LA Hbb1) contained 
a 4 bp insertion in the second exon that would render 
it nonfunctional. Comparisons with cDNA-derived sequence 
databases revealed several putatively functional transcripts 
that lacked the inactivating 4 bp insertion but were otherwise 
identical in sequence. We assumed that the insertion was 
either a sequencing or assembly artifact, and we therefore 
used the cDNA-derived sequence for all further analyses. 

The MN and LA clusters of the different species exhibited 
substantial variation in both physical extent and gene content 
(fig. 2). From the start codon of the first globin gene to the 
stop codon of the last globin gene, the MN cluster ranged 
from 3.4 kb in fugu to 68.5 kb in zebrafish, and the LA cluster 
ranged from 3.4 kb in stickleback to 17.2 kb in zebrafish. 
With respect to gene content, the number of globin genes 
in these clusters ranged from 2 in the MN clusters of fugu and 
green-spotted puffer and the LA cluster of stickleback, to 13 in 
the MN clusters of tilapia and zebrafish (not including two 
genes with partial sequence coverage in the tilapia assembly; 
fig. 2). Interspecific comparisons revealed a higher rate of 
globin gene turnover in the MN cluster than in the LA cluster. 
The MN clusters of fugu and green-spotted puffer possess only 
two α-globin genes in the reverse orientation, whereas the 
MN clusters of all other teleosts contain interspersed α- and 
β-globin genes in both head-to-head and head-to-tail orientations 
(fig. 2). In the case of stickleback, all of the α-globin 
genes are found in the reverse orientation, and all of the 
β-globin genes are in the forward orientation (fig. 2). In all 
other teleosts, in contrast, multiple α- and β-globin genes are 
found in both forward and reverse orientations. In all species 
examined, the LA cluster harbors two tandemly duplicated 
α-globin genes, and when present, the β-globin genes are 
sandwiched in between the α-globin genes but in the opposite 
orientation. The comparative genomic analysis revealed that 
the β-globin genes of stickleback are only present in the MN 
cluster, whereas the single β-globin genes of fugu and green-spotted 
puffer are only present in the LA cluster. Thus, the 
β-globin genes of stickleback and the two tetraodontid species 
are not 1:1 orthologs. Furthermore, the set of three globin 
genes in the LA clusters of fugu and green-spotted puffer 
appears to have been inverted relative to those of medaka, 
stickleback, and zebrafish. This inversion hypothesis predicts 
that the LA Hba1 genes from fugu and green-spotted puffer 
should be most closely related to the LA Hba1 genes of 
medaka, platyfish, stickleback, tilapia, and zebrafish. 
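
As a small worked illustration of how a physical extent follows from genomic coordinates, the sketch below computes the span between the start and end positions listed in table 1 for four clusters. Two caveats apply: the table 1 coordinates are defined by the outer edges of the terminal genes rather than by start and stop codons, so the spans need not equal the extents quoted in the text, and treating the coordinates as 1-based and inclusive is an assumption made here. The dictionary and function names are illustrative.

```python
# (species, cluster) -> (start, end), copied from table 1.
TABLE1_COORDS = {
    ("fugu", "MN"): (417_195, 420_598),
    ("fugu", "LA"): (2_511_982, 2_517_737),
    ("zebrafish", "MN"): (55_938_147, 55_999_373),
    ("zebrafish", "LA"): (21_688_806, 21_705_956),
}


def span_kb(start: int, end: int) -> float:
    """Length in kb of a 1-based, inclusive interval (assumed convention)."""
    if end < start:
        raise ValueError("end must not precede start")
    return (end - start + 1) / 1000


for (species, cluster), (start, end) in TABLE1_COORDS.items():
    print(f"{species} {cluster}: {span_kb(start, end):.1f} kb")
```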

The Globin Gene Cluster in Gar and the Origin of the MN and 
LA Clusters of Teleosts 

In most cases, orthologs of genes flanking the MN and LA 
globin clusters in teleost fish are located in the vicinity of the 
α-globin gene cluster in human and chicken, which appears 
to represent the ancestral location of the proto-Hb gene in 
jawed vertebrates (Hoffmann, Opazo, and Storz 2012). The 2:1 
pattern of conserved synteny between teleost fish and tetrapods 
suggests that the MN and LA globin clusters of teleost 
fish derive from the TGD, as suggested by Quinn et al. (2010). 
This inference is also supported by the presence of duplicate 
copies of rhbdf1 in teleosts, which are co-orthologs of the 
single-copy rhbdf1 in tetrapods. Additional bioinformatic 
searches in the vicinity of the globin gene clusters revealed 
that most teleosts also possess duplicate copies of sh/sa9 and 
mlk2, one on the LA cluster and one on the MN cluster, that 
are co-orthologous to single-copy genes on the same chromosome 
as the α-globin gene cluster in human and chicken. 

Two additional lines of evidence support the hypothesis 
that the LA and MN clusters represent paralogous products 
of the TGD. First, we tested the prediction that the spotted 
gar (a nonteleost ray-finned fish) would possess a single 
globin gene cluster, since the gar and teleost lineages diverged 
before TGD. Consistent with this prediction, our comparative 
genomic analysis revealed that the spotted gar does indeed 
possess a single globin gene cluster, ~52 kb in length, that 
contains 5 α- and 5 β-globin genes in both forward and reverse 
orientations (fig. 2). The cluster is flanked by copies of 
c16orf33, polr3k, mgrn1, foxj1, aanat, rhbdf1, mpg, and nprl3 on 
the left, and by copies of luc7l and itfg3 on the right (fig. 1). 
Second, we tested the prediction that the LA and MN gene 
clusters of teleosts descend from the same linkage group in 
the reconstructed protokaryotype of the pre-TGD teleost 
ancestor. Consistent with this prediction, an analysis of con- 
served synteny revealed that the MN and LA clusters of 
medaka are embedded in paralogous chromosomal segments 
that trace their duplicative origin to chromosome "e" in the 
pre-TGD teleost protokaryotype inferred by Kasahara et al. 
(2007) and Nakatani et al. (2007). 
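
The 2:1 pattern discussed above can be made concrete with a short sketch: given, for each single-copy gene in an outgroup, the list of teleost copies and the cluster each copy is linked to, flag the genes that have exactly one copy next to each cluster. The rhbdf1 and nprl3 entries follow the linkage described in the text; the data structure, the function name, and the "geneY" entry are hypothetical and serve only as an illustration, not as the procedure used in this study.

```python
from typing import Dict, List, Tuple

# outgroup gene -> list of (teleost copy, linked cluster)
Copies = Dict[str, List[Tuple[str, str]]]


def two_to_one(copies: Copies) -> List[str]:
    """Return outgroup genes with exactly one teleost copy on each cluster."""
    hits = []
    for gene, paralogs in copies.items():
        clusters = sorted(cluster for _, cluster in paralogs)
        if clusters == ["LA", "MN"]:
            hits.append(gene)
    return sorted(hits)


example: Copies = {
    "rhbdf1": [("rhbdf1a", "MN"), ("rhbdf1b", "LA")],  # linkage as in the text
    "nprl3": [("nprl3", "MN")],                        # retained on MN only
    "geneY": [("geneYa", "LA"), ("geneYb", "LA")],     # hypothetical tandem pair
}

print(two_to_one(example))  # ['rhbdf1']
```

Requiring one copy per cluster separates duplicates that are consistent with a whole-genome duplication from tandem duplicates that sit on the same cluster.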

Phylogenetic Relationships among Teleost α- and 
β-Globin Genes 

After characterizing the genomic organization of the globin 
gene clusters in spotted gar and the seven teleost fish, we 
performed phylogenetic analyses to reconstruct the duplicative 
history of the α- and β-globin genes. For this analysis, we 
added the globin gene repertoires of cod and salmon to those 
of fugu, green-spotted puffer, medaka, platyfish, stickleback, 
tilapia, and zebrafish, and we also included sequences from 
representative tetrapods and cartilaginous fish for comparative 
purposes. All of the different alignment strategies produced 
very similar results for the α- and β-globin data sets, 
and in both cases, we selected the L-INS-i alignment for use in 
the phylogenetic reconstructions because it had the highest 
MUMSA score. Before estimating phylogenies, we selected 
the best-fitting models of amino acid and nucleotide substi- 
tution based on the Akaike information criterion with cor- 
rection for small sample size. In analyses based on nucleotide 
sequences, we selected an independent model for each codon 
position. Results of the model estimation procedure can be 
found in supplementary table S3, Supplementary Material 
online. 

The estimated phylogenies of vertebrate globin sequences 
suggested that neither the α- nor the β-globin genes of ray-finned 
fish are monophyletic relative to their tetrapod counterparts 
(fig. 3A and B). In the case of α-globin genes, a clade of fish 
sequences that included a subset of genes derived from the 
teleost LA cluster (LA Hba clade 1 + gar Hba3) was placed 
sister to the chicken α^-globin gene, whereas all other fish 
α-globins were placed in a second monophyletic group 
(fig. 3A). In the case of the β-like globin genes, a clade of 
two gar sequences, gar Hbb4 and Hbb5, was 
placed sister to the chicken β-globins. These arrangements 
suggest that multiple α- and β-globin genes were present in 
the common ancestor of Actinopterygii + Sarcopterygii. 

The phylogeny shown in figure 3A revealed that fish 
α-globins can be arranged into two distinct clades, defined 
by the presence of gar Hba2 and gar Hba3, respectively. In 
turn, teleost α-globins were arranged into five clades that 
(with the exception of the cod LA Hba2 sequence) reflect 
their cluster of origin. The discordant position of the cod LA 
Hba2 sequence probably represents an assembly artifact. 
Aside from this cod sequence, all α-globin genes derived 
from the LA cluster were grouped into two strongly supported 
clades: LA Hba clade 1 is sister to gar Hba3, and LA 
Hba clade 2 is embedded in a strongly supported clade that 
includes all MN α-globin sequences in addition to LA Hba2 
from cod and Hba2, Hba4, and Hba5 from spotted gar 
(fig. 3A). Genealogical relationships within these two clades 
of LA α-globins are largely congruent with the known organismal 
relationships, and in both cases, the deepest split separated 
the zebrafish genes from those of the remaining 
euteleost taxa. As expected under the cluster-inversion hypothesis, 
the leftmost α-globin genes in the LA cluster of fugu 
and green-spotted puffer are most closely related to the rightmost 
α-globin genes of medaka, platyfish, stickleback, tilapia, 
and zebrafish, and vice versa (fig. 3A). Relationships among 
the α-globin sequences in the MN cluster are more complex 
and are not easily reconciled with the organismal phylogeny. 
The MN-linked genes are organized into three weakly supported 
clades (fig. 3A). MN Hba clade 1 contains salmon and 
platyfish sequences in addition to two gar sequences, whereas 
MN Hba clade 2 contains zebrafish and salmon sequences. 
MN Hba clade 3 was placed sister to LA Hba clade 2 and 
includes sequences from all teleosts in addition to cod LA 
Hba2. All species examined possess an α-globin gene repertoire 
that includes representatives of at least three of the five 
clades, and zebrafish possesses α-globin genes that are represented 
in four of the five clades. 

In contrast to the α-globin genes, all teleost β-globin genes 
were placed in a moderately well-supported clade, which was 
placed sister to a clade of two gar Hbb sequences (Hbb1 and 
Hbb2). The other two gar Hbb sequences were placed sister to 
chicken Hbbs (fig. 3B). The β-globin genes could be arranged 
into four separate clades, three of which were strongly supported, 
with sequences from the MN cluster forming a paraphyletic 
group relative to those from the LA cluster. The 
β-globins from the LA cluster were placed in a monophyletic 
group, while those from the MN cluster can be grouped into 
three separate clades, with the exception of cod MN Hbb7, 
which is distantly related to the rest. MN Hbb clade 1 contains 
sequences from medaka, salmon, tilapia, and zebrafish; MN 
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Fig. 3. Maximum likelihood phylogram depicting relationships among the globin sequences of seven representative teleost fishes. Phylogenetic 
reconstructions were based on the coding sequences of a- and /J-globin genes (panels A and B, respectively). Cartilaginous fish globins were used 
as outgroup sequences, and tetrapod sequences were included for comparative purposes. Values on the nodes denote bootstrap support values (above) 
and Bayesian posterior probabilities (below). Branches are color coded according to the location of the genes: MN-linked genes are shown in blue, 
LA-linked genes are shown in orange, and the gar genes are in green. Labels are color coded based on the timing of their expression. The substitution 
models selected are listed in supplementary table S3, Supplementary Material online. 



Hbb clade 2 contains sequences from platyfish, salmon, til- 
apia, and zebrafish; and MN Hbb clade 3 contains sequences 
from cod, medaka, platyfish, salmon, stickleback, tilapia, and 
zebrafish (fig. 3B). Within each of these clades, paralogs from 
the same species almost invariably formed monophyletic 
groups, which likely reflects a history of lineage-specific 
duplication, as with the Hba genes of the MN cluster. This 
is particularly clear in the case of MN Hbb clade 3, where 
relationships among the different paralogs are congruent with 
the known organismal phylogeny after accounting for 
lineage-specific duplications. 
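
The monophyly criterion invoked above can be made concrete with a
minimal Python sketch: a set of sequences is monophyletic in a rooted
tree if some node has exactly those sequences as its descendant leaves.
The nested-tuple tree encoding, the function names, and the toy labels
(A1, A2, B1, B2, C1) are illustrative assumptions; they do not
correspond to the trees in figure 3 or to the software used for the
analyses.

```python
from typing import Union

# Assumed toy encoding: a leaf is a string label, an internal node is a tuple of subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> frozenset:
    """Return the set of leaf labels found under a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    result = frozenset()
    for child in tree:
        result |= leaves(child)
    return result


def is_monophyletic(tree: Tree, group: set) -> bool:
    """True if some node of the rooted tree has exactly `group` as its leaf set."""
    target = frozenset(group)
    if leaves(tree) == target:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Hypothetical tree: the two paralogs of species A group together,
# whereas the two paralogs of species B fall in different parts of the tree.
toy = ((("A1", "A2"), "B1"), ("B2", "C1"))
print(is_monophyletic(toy, {"A1", "A2"}))  # True
print(is_monophyletic(toy, {"B1", "B2"}))  # False
```

In this toy case, the species A paralogs would be interpreted as
products of a lineage-specific duplication, whereas the species B
paralogs would trace back to a duplication that predates the
divergence of the species in the tree.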

With the exception of β-globin genes from the LA cluster,
globin genes of the same subunit type from the MN or LA
clusters did not form monophyletic groups. Taken together,
the analyses of conserved synteny (figs. 1 and 2) and the
phylogenetic reconstructions (fig. 3) indicate that the
pre-TGD globin gene cluster of teleost fish contained at
least two α-globin genes and two β-globin genes. Further, the
positions of the gar sequences in the phylogenies of α- and
β-like globins indicate that multiple globins of each subunit
type were present in the common ancestor of gar and teleosts.
If further analyses confirm the paraphyly of ray-finned
fish α- and β-globins relative to their tetrapod homologs, it
would indicate that multiple α- and β-globin genes were
present in the common ancestor of Actinopterygii and
Sarcopterygii. As for teleosts, after the TGD but before divergence
between zebrafish and the remaining euteleost species,
one of the two ancestral β-globin paralogs in the LA cluster
was secondarily lost such that the post-TGD globin repertoire
was reduced from 8 to 7 genes (fig. 4). Similar lineage-specific
patterns of gene turnover have been documented in the
α- and β-globin gene clusters of mammals and other vertebrates
(Hoffmann et al. 2008a, 2008b; Opazo et al. 2008a,
2008b, 2009; Hoffmann, Storz, et al. 2010). On a deeper evolutionary
timescale, lineage-specific duplications and deletions
have produced extensive variation in the size and
membership composition of the globin gene superfamily
among different vertebrate classes and among different
deuterostome phyla and subphyla (Ebner et al. 2010; Hoffmann,
Storz, et al. 2010; Hoffmann et al. 2011; Storz et al. 2011;
Hoffmann, Opazo, Hoogewijs, et al. 2012; Hoogewijs et al.
2012).

Fig. 4. Evolutionary model describing the duplicative origins of the LA and MN globin gene clusters of teleost fish and the inferred globin gene
repertoire in the common ancestor of teleosts and gar, a nonteleost ray-finned fish. All clusters depicted are hypothetical with the exception of the gar
cluster. The order of the α- and β-globin genes on the hypothetical clusters is arbitrary.

The phylogenies in figure 3 indicate that all salmon α- and
β-globin genes are exclusively found in association with
MN-linked globin genes from other species. This reflects the
fact that salmonid fish have experienced an additional
lineage-specific genome duplication and that all globin
genes were deleted from the duplicated LA clusters and
were retained exclusively in the duplicated MN clusters
(Quinn et al. 2010). With the exception of fugu and
green-spotted puffer, which possess identical globin gene
repertoires, all species in our study have expanded their
repertoires of α- and β-globin genes via lineage-specific
duplications, which are much more frequent in the MN
cluster. The most striking contrast is between the MN
α-globins from platyfish, tilapia, and zebrafish, and the MN
β-globins from stickleback. The stickleback β-globins in the
MN cluster derive from a recent set of duplications, whereas
the α-globins from the MN clusters of platyfish, tilapia, and
zebrafish derive from recent, lineage-specific duplications of
genes that themselves trace back to more ancient duplications,
which likely occurred before the TGD.

In addition to the differences in timing, these
lineage-specific duplications also appear to involve different
mechanisms. In many instances, the expansions derive from
single gene duplications, such as the one giving rise to the
duplicate Hbb paralogs in the zebrafish LA cluster. On the
other hand, the structure of the MN clusters of medaka,
stickleback, and zebrafish suggests that en bloc duplications
are partly responsible for their lineage-specific expansions in
gene family size. In the case of the stickleback, the presence of
extensive internal colinearity within the MN cluster suggests
that it expanded by en bloc duplications involving either
the Hbb-Hba pair or an Hbb-Hba-Hbb-Hba four-gene
set (fig. 5). The same can be said for the zebrafish MN
Hba2-Hbb2 and MN Hba3-Hbb3 gene pairs
(supplementary fig. S1, Supplementary Material
online). However, comparisons of zebrafish MN Hba4-6 and
MN Hbb4-6 gene pairs revealed low levels of sequence similarity
in flanking regions (supplementary fig. S1, Supplementary
Material online).
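
The logic of using internal colinearity as evidence of en bloc
duplication can be illustrated with a simplified self-comparison dot
plot. The Python sketch below records every pair of positions that
share an identical k-mer and then measures the longest run of such
matches along a single off-diagonal. Exact k-mer matching, the value
of k, the function names, and the toy sequence are assumptions made
for illustration only; this is not the procedure used to produce the
dot plots in figure 5.

```python
from collections import defaultdict


def self_dot_plot(seq: str, k: int = 4) -> list:
    """Return sorted (i, j) pairs with i < j whose k-mers are identical."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    hits = []
    for starts in positions.values():
        # Every pair of occurrences of the same k-mer is one off-diagonal dot.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                hits.append((starts[a], starts[b]))
    return sorted(hits)


def longest_off_diagonal_run(hits: list) -> int:
    """Length, in consecutive k-mer matches, of the longest off-diagonal tract."""
    hit_set = set(hits)
    best = 0
    for i, j in hits:
        if (i - 1, j - 1) in hit_set:
            continue  # this dot continues a run that started earlier
        length = 1
        while (i + length, j + length) in hit_set:
            length += 1
        best = max(best, length)
    return best


# Hypothetical sequence: an 8-bp unit, a 2-bp spacer, and a second copy of the unit.
duplicated = "ACGTTGCA" + "GG" + "ACGTTGCA"
dots = self_dot_plot(duplicated, k=4)
print(dots)                            # [(0, 10), (1, 11), (2, 12), (3, 13), (4, 14)]
print(longest_off_diagonal_run(dots))  # 5
```

A block that has been copied as a unit yields a long uninterrupted
tract parallel to the self-identity diagonal, as in the toy sequence,
whereas a region with no internal duplication yields no off-diagonal
dots at all.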

Fig. 5. Dot plots of intrachromosomal sequence similarity in the MN globin gene clusters of medaka and stickleback. The fragment includes all genes in
the clusters in addition to 5 kb of flanking sequence. The diagonal self-identity plot is shown in gray, as are the low-complexity areas in the medaka
cluster. Note that the intragenomic dot plot for the stickleback gene cluster shows longer tracts of internal similarity off the self-identity diagonal relative
to that for the medaka gene cluster, shown in black.

Repeated Evolutionary Transitions in Functional
Properties and Stage-Specific Expression
In light of evidence that the developmental regulation of Hb
synthesis has evolved independently in multiple tetrapod lineages
(Hoffmann, Storz, et al. 2010; Storz et al. 2011, 2012), we
tested for evidence of a similar phenomenon in teleosts by
reconstructing phylogenetic relationships among α- and
β-like globin genes that are differentially expressed during
development. For the purposes of this analysis, globin genes
were classified as "early-expressed" if they are preferentially
expressed during embryonic or larval developmental stages,
whereas genes were classified as "late-expressed" if they are
preferentially expressed in juveniles or adults (supplementary
table S4, Supplementary Material online, fig. 2). Since fugu and
green-spotted puffer possess a single β-globin gene, we
assumed that this gene is expressed during all ontogenetic
stages. For comparative purposes, we included additional teleost
globins that are known to be preferentially expressed
during embryogenesis in channel catfish (Ictalurus punctatus;
Chen et al. 2010), rainbow trout (Oncorhynchus mykiss;
Maruyama et al. 1999), and salmon (Leong et al. 2010). We also
analyzed late-expressed α- and β-globin genes whose products
are incorporated into tetrameric Hbs with highly distinct
functional properties, such as the well-characterized anodic
and cathodic Hbs of the European eel (Anguilla anguilla; Fago
et al. 1995, 1997) and dusky notothen (Trematomus newnesi;
Mazzarella et al. 1999). Expression data for cod, medaka, and
zebrafish were obtained from the literature. The cod sequences
were classified following Wetten et al. (2010), the
medaka sequences were classified following Maruyama,
Yasumasu and Iuchi (2004), and the zebrafish sequences
were classified following Tiedke et al. (2011). For globin
genes in fugu, gar, green-spotted puffer, platyfish, salmon,
and tilapia, we inferred the timing of expression by identifying
matches with sequences in EST databases (supplementary
table S4, Supplementary Material online). In the cases of sequences
with no matches from the same species as the query
sequence or lack of developmental information for the EST
matches, the sequences were left as unclassified.
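
The EST-based classification rule described above can be summarized
in a short Python sketch. The record structure, the stage labels, the
function name, and the treatment of genes with both early- and
late-stage matches are assumptions introduced for illustration; the
criteria actually applied to each gene are those documented in
supplementary table S4.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed stage vocabularies; real EST library annotations are more heterogeneous.
EARLY_STAGES = {"embryo", "larva"}
LATE_STAGES = {"juvenile", "adult"}


@dataclass
class EstMatch:
    species: str          # species of the matching EST
    stage: Optional[str]  # None when the library lacks developmental information


def classify_expression(query_species: str, matches: list) -> str:
    """Classify a globin gene from the developmental stages of its EST matches."""
    # Only same-species matches with a recorded stage are informative.
    stages = {m.stage for m in matches
              if m.species == query_species and m.stage is not None}
    early = bool(stages & EARLY_STAGES)
    late = bool(stages & LATE_STAGES)
    if early and late:
        return "all stages"  # assumption: evidence from both periods
    if early:
        return "early-expressed"
    if late:
        return "late-expressed"
    return "unclassified"


# Hypothetical matches for a tilapia query sequence.
print(classify_expression("tilapia", [EstMatch("tilapia", "embryo")]))  # early-expressed
print(classify_expression("tilapia", [EstMatch("medaka", "adult")]))    # unclassified
print(classify_expression("tilapia", [EstMatch("tilapia", None)]))      # unclassified
```

The two "unclassified" cases correspond to the two situations named
in the text: no match from the same species as the query, and matches
that carry no developmental information.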

Intriguingly, results of our analyses revealed repeated
evolutionary transitions in stage-specific expression during
development. In some cases, paralogous genes in different
species evolved convergent expression patterns, and in
other cases, orthologous genes evolved divergent expression
patterns. In the case of the α-like globin genes, LA Hba clades
1 and 2 provide clear examples of probable 1:1 orthologs that
evolved differences in stage-specific expression (e.g., the
early-expressed zebrafish LA Hba1 and the late-expressed
medaka LA Hba1; fig. 3A). In the case of β-like globin genes,
LA Hbb clade 1 illustrates a similar pattern of replicated
expression divergence (e.g., the early-expressed zebrafish LA
Hbb1 and Hbb2 genes are clearly co-orthologous to the
adult-expressed medaka LA Hbb1; fig. 3B). These results
demonstrate that the developmental timing of globin gene
expression is evolutionarily labile.

In the α- and β-globin gene clusters of most amniotes, the
linear order of the genes reflects their temporal order of
expression during development, with early-expressed genes at
the 5' end of the cluster and late-expressed genes at the 3' end
of the cluster (Hardison 2001). In the globin gene clusters of
teleosts, in contrast, linear gene order is not as strong a
predictor of stage-specific expression. In the case of the zebrafish
MN cluster, all late-expressed genes are on the left and all
early-expressed genes are on the right. In medaka, all
genes on the left side are early-expressed and the genes on the
right are variable with respect to the developmental timing of
expression, and in tilapia, the early- and late-expressed genes
are interspersed. Our results indicate that the genes in the LA
cluster provide the clearest evidence of lineage-specific
changes in gene expression.

Since embryonic/fetal Hbs and adult-expressed Hbs
exhibit consistent differences in O2 affinity and sensitivity to
allosteric regulators (Ingermann 1997), convergence in
stage-specific expression also likely entailed convergence in
functional properties. Similarly, adult α- and β-globin genes that
encode the subunits of cathodic Hbs of European eel and
dusky notothen are clearly not 1:1 orthologs (fig. 6), indicating
that specialized Hbs with similar functional properties evolved
independently in different teleost lineages. In fact, the dusky
notothen cathodic Hba is closely related to sequences in the
LA cluster, whereas the eel cathodic Hba is closely related to
sequences in the MN cluster, suggesting that they trace their
duplicative origin at least to the TGD. Consistent with
other studies of vertebrate globins (Berenbrink et al. 2005;
Hoffmann et al. 2010), these results demonstrate that similar
expression patterns and functional properties in the Hbs of
distinct lineages may sometimes represent products of
convergent evolution. Although tandemly duplicated globin
genes often evolve in concert due to interparalog gene
conversion (Hoffmann et al. 2008a, 2008b; Opazo et al. 2009;
Runck et al. 2009; Storz et al. 2011), paralogous genes that
are products of genome duplications (also known as
"ohnologs") can escape the homogenizing effects of gene
conversion because they are located on different chromosomes.
This is one possible reason why paralogous gene
copies derived from genome duplications may be more
likely to diverge in function than tandem gene duplicates.

Fig. 6. Maximum likelihood phylogram depicting relationships among the globin genes of the seven fish species for which full genome sequence data
were available, plus sequences of functionally annotated globins from other teleost species. The phylogenetic reconstructions were based on the amino
acid sequences of α- and β-globins (panels A and B, respectively). Cartilaginous fish globins were used as outgroup sequences, and tetrapod sequences
were included for comparative purposes. Values on the nodes denote bootstrap support values (above) and Bayesian posterior probabilities (below).
Genes are color coded according to their time of expression. Genes expressed at the embryonic/larval stages are shown in magenta, genes expressed at
the juvenile and/or adult stages are shown in light blue, and genes expressed across all ontogenetic stages are shown in dark blue. Genes with no record of
expression and genes from nonactinopterygian vertebrates are shown in gray. The substitution models selected are listed in supplementary table S3,
Supplementary Material online.

Conclusion 

Results of our combined phylogenetic and comparative genomic
analyses indicate that some of the teleost α- and β-like
globins are representatives of ancient gene lineages, with
duplicative origins that trace back at least to the common
ancestor of gar and teleost fish, and potentially back to the
common ancestor of Actinopterygii and Sarcopterygii
(superclass Osteichthyes). Such a scenario is consistent with the fact
that Hb multiplicity has also been documented in cartilaginous
fish (Fyhn and Sullivan 1975; Mumm et al. 1978; Weber
et al. 1983; Calderisi et al. 1996; Dafre and Reischl 1997). Our
results indicate that the common ancestor of ray-finned fish
possessed a fairly diverse globin gene repertoire, and in teleosts,
this inherited repertoire was further augmented by the
TGD, which produced dual sets of α- and β-like globin genes
on two paralogous chromosomes. These TGD-derived gene
clusters underwent lineage-specific changes in size and
membership composition, and the MN gene cluster underwent an
especially high rate of gene turnover. The phylogenetic
analyses of teleost globins revealed repeated transitions in
stage-specific expression patterns, demonstrating a surprising
fluidity in the genetic regulatory control of Hb synthesis
during development.

Supplementary Material 

Supplementary tables S1-S4, figure S1, and data file S1 are
available at Molecular Biology and Evolution online
(http://www.mbe.oxfordjournals.org/).